Introduction to Seaborn
Seaborn is a Python data visualization library based on Matplotlib that provides a high-level interface for creating attractive and informative statistical graphics. It integrates seamlessly with pandas data structures and offers built-in themes for professional-looking plots.
Beautiful Defaults
Attractive default styles and color palettes
Statistical Plots
Built-in support for complex visualizations
Pandas Integration
Works directly with DataFrames
High-Level API
Less code for complex visualizations
Installation & Setup
Install Seaborn
# Install via pip
pip install seaborn
# Install with conda
conda install seaborn
Import and Basic Setup
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Set the style
sns.set_theme()
# Or set specific style
sns.set_style("whitegrid")
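The style calls above change Matplotlib's global settings. As a minimal sketch (assuming seaborn 0.11 or later, where `set_theme()` exists), `sns.axes_style()` can also be used as a context manager to apply a style to one figure only. The `Agg` backend line is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Apply the default theme, then switch to the "whitegrid" style globally
sns.set_theme()
sns.set_style("whitegrid")
grid_global = plt.rcParams["axes.grid"]       # "whitegrid" turns the grid on

# Temporarily use a different style for a single figure
with sns.axes_style("white"):
    grid_inside = plt.rcParams["axes.grid"]   # "white" turns the grid off
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

grid_after = plt.rcParams["axes.grid"]        # global style is restored
plt.close("all")
```

The context manager is handy when one plot in a report needs a different look without disturbing the rest.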
💡 Built-in Datasets
Seaborn includes several built-in datasets perfect for learning and testing. Access them using sns.load_dataset('dataset_name').
Basic Plotting Concepts
Loading Data
# Load built-in dataset
tips = sns.load_dataset("tips")
iris = sns.load_dataset("iris")
titanic = sns.load_dataset("titanic")
# View first few rows
print(tips.head())
Figure-Level vs Axes-Level Functions
Understanding Seaborn's Interface
- Axes-level: Functions like scatterplot() and lineplot() plot onto a single Matplotlib axes
- Figure-level: Functions like relplot() and catplot() create an entire figure, optionally with multiple subplots
# Axes-level function
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()
# Figure-level function
sns.relplot(data=tips, x="total_bill", y="tip",
            hue="smoker", col="time")
plt.show()
Distribution Plots
Histograms and KDE
# Histogram
sns.histplot(data=tips, x="total_bill")
# Histogram with KDE overlay
sns.histplot(data=tips, x="total_bill", kde=True)
# KDE plot
sns.kdeplot(data=tips, x="total_bill")
# Multiple distributions
sns.kdeplot(data=tips, x="total_bill", hue="time")
# 2D KDE (bivariate)
sns.kdeplot(data=tips, x="total_bill", y="tip")
Distribution Plot (displot)
# Figure-level distribution plot
sns.displot(data=tips, x="total_bill", hue="time",
            kind="kde", fill=True)
# Histogram with facets
sns.displot(data=tips, x="total_bill", col="time",
            row="sex", kde=True)
# ECDF plot
sns.displot(data=tips, x="total_bill", kind="ecdf")
Rug Plot
# Add rug plot to show individual observations
sns.kdeplot(data=tips, x="total_bill")
sns.rugplot(data=tips, x="total_bill")
Categorical Plots
Box and Violin Plots
# Box plot
sns.boxplot(data=tips, x="day", y="total_bill")
# Box plot with hue
sns.boxplot(data=tips, x="day", y="total_bill", hue="smoker")
# Violin plot (shows distribution)
sns.violinplot(data=tips, x="day", y="total_bill")
# Split violin plot
sns.violinplot(data=tips, x="day", y="total_bill",
               hue="sex", split=True)
Strip and Swarm Plots
# Strip plot (one point per observation, jittered to reduce overlap)
sns.stripplot(data=tips, x="day", y="total_bill")
# Swarm plot (non-overlapping points)
sns.swarmplot(data=tips, x="day", y="total_bill")
# Combine box plot with swarm plot
sns.boxplot(data=tips, x="day", y="total_bill")
sns.swarmplot(data=tips, x="day", y="total_bill",
              color="black", alpha=0.5)
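The box-plus-swarm combination works because axes-level functions draw onto the current Axes. A slightly more explicit sketch, with synthetic data and an `ax` passed to both calls, is shown here. Setting `showfliers=False` is a design choice of this example, so that outliers are not drawn twice (once by the box plot and once by the swarm).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic bills for four days (hypothetical values)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], 15),
    "total_bill": rng.gamma(shape=4, scale=5, size=60),
})

fig, ax = plt.subplots()
# Summary layer: boxes without outlier markers
box_ax = sns.boxplot(data=df, x="day", y="total_bill",
                     color="lightgray", showfliers=False, ax=ax)
# Detail layer: every observation on top of the boxes
swarm_ax = sns.swarmplot(data=df, x="day", y="total_bill",
                         color="black", alpha=0.5, size=3, ax=ax)
plt.close(fig)
```

Both calls return the same Axes, which is what makes layering possible.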
Bar and Count Plots
# Bar plot (shows the mean per category with a 95% confidence interval by default)
sns.barplot(data=tips, x="day", y="total_bill")
# Count plot (counts occurrences)
sns.countplot(data=tips, x="day")
# Count plot with hue
sns.countplot(data=tips, x="day", hue="sex")
Point Plot
# Point plot (shows mean and CI as points and lines)
sns.pointplot(data=tips, x="day", y="total_bill", hue="sex")
Categorical Plot (catplot)
# Figure-level categorical plot
sns.catplot(data=tips, x="day", y="total_bill",
            kind="box", col="time")
# Violin plot with facets
sns.catplot(data=tips, x="day", y="total_bill",
            kind="violin", hue="sex", col="time")
Relational Plots
Scatter Plots
# Basic scatter plot
sns.scatterplot(data=tips, x="total_bill", y="tip")
# With hue (color)
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
# With size
sns.scatterplot(data=tips, x="total_bill", y="tip",
                size="size", hue="day")
# With style (marker)
sns.scatterplot(data=tips, x="total_bill", y="tip",
                style="smoker", hue="day")
Line Plots
# Load time series data
flights = sns.load_dataset("flights")
# Basic line plot (the 12 monthly values per year are averaged, with a CI band)
sns.lineplot(data=flights, x="year", y="passengers")
# Multiple lines with hue
sns.lineplot(data=flights, x="month", y="passengers",
             hue="year")
# Aggregated line plot with CI
fmri = sns.load_dataset("fmri")
sns.lineplot(data=fmri, x="timepoint", y="signal",
             hue="event")
Relational Plot (relplot)
# Figure-level relational plot
sns.relplot(data=tips, x="total_bill", y="tip",
            hue="smoker", col="time", row="sex")
# Line plot with facets
sns.relplot(data=flights, x="year", y="passengers",
            kind="line", col="month", col_wrap=4)
Matrix and Heatmaps
Heatmap
# Create correlation matrix
corr = tips.corr(numeric_only=True)
# Basic heatmap
sns.heatmap(corr)
# Heatmap with annotations
sns.heatmap(corr, annot=True, fmt=".2f",
cmap